Observations are recorded on variables x and y but a mechanism, which may depend on the
observed x values, causes some of the y values to be missing. For three parametric
examples, exact or approximate ancillary statistics are constructed. Conditioning on these
ancillaries enables the missing data mechanism to be ignored under certain conditions. A
correspondence is shown between these conditional procedures and the use of the observed
information matrix in measuring the dispersion of the maximum likelihood estimator.

SIGNIFICANCE AND EXPLANATION


Many  statistical  problems  can  be  viewed  as  missing  data  problems.  Data 
may  be  missing  for  practical  reasons,  out  of  the  control  of  the  data- 
collector,  or  missing  by  design,  such  as  in  sample  surveys  where,  for  some 
variables,  data  is  available  for  the  whole  population  but,  for  other 
variables, data is recorded for the sample and is 'missing' for the remainder
of  the  population. 

Rubin  (1976)  has  shown  that  the  mechanism  which  causes  the  data  to  be 
missing  (the  sampling  design  in  the  survey  context)  can,  for  a  wide  class  of 
situations,  be  ignored  for  Bayes  or  Likelihood  inference  but  not  for  classical 
sampling  distribution  theory  inference.  This  non-ignorability  of  the  missing 
data  mechanism  can  make  classical  inference  much  more  complicated  and  even 
impossible  if  the  mechanism  is  unknown. 

In  this  paper  we  apply  ideas  of  Barndorff-Nielsen  (1980),  for  a  class  of 
statistical  models  called  curved  exponential  families,  to  construct  ancillary 
statistics  for  some  missing  data  problems.  We  show  that  if  classical 
inference  is  carried  out  conditional  upon  the  observed  values  of  these 
ancillary  statistics  then  the  missing  data  mechanism  may  be  ignored,  in 
certain  situations.  Some  correspondence  between  these  conditional  procedures 
and  likelihood  methods  is  established.  The  approach  rests  heavily  on 
examples.  This  is  characteristic  of  the  conditioning  literature,  where 
specific  results  are  often  more  enlightening  than  attempts  at  generality. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


CONDITIONING  IN  A  MISSING  DATA  PROBLEM 


C.  J.  Skinner 
1.  INTRODUCTION 

In  this  article  we  consider  the  possibility  of  conditioning,  using  either  exact  or 
approximate  ancillary  statistics,  in  a  problem  of  estimation  with  missing  data.  The 
approach  is  not  general  but  illustrative,  using  three  examples. 

Rubin  (1976)  showed  that  the  mechanism  which  causes  data  to  be  missing  may,  in  a  large 
class  of  situations,  be  ignored  for  Bayesian  or  Likelihood  inference  but  not  for  sampling 
distribution  inference.  We  shall  show  how  conditioning  can  enable  this  mechanism  to  be 
ignored for a wider class of situations under sampling distribution inference. Essentially
we show, as in Efron and Hinkley (1978), that the conditional distribution of the maximum
likelihood estimator (MLE) given an appropriate ancillary corresponds, at least
approximately, to the use of the inverse of the observed Fisher information matrix to
measure the dispersion of the MLE. The latter procedure, being purely likelihood based,
shares the ignorability properties of Likelihood inference. To provide initial motivation
for the form of the conditioning procedure we compare our estimation problem with a
prediction  problem  in  survey  sampling. 

We assume the pairs $(y_i, x_i)$, $i = 1,\ldots,N$, form a random sample from a bivariate
distribution $p(y,x;\phi)$ belonging to a family indexed by the (generally vector) parameter
$\phi$. We do not observe the complete data $d_c = (y_1,\ldots,y_N, x_1,\ldots,x_N)$ but only the
incomplete data $d_I = (y_{i_1},\ldots,y_{i_n}, s, x_1,\ldots,x_N)$ where $s = \{i_1,\ldots,i_n\}$ is a subset of
size n from $U = \{1,\ldots,N\}$, $n < N$. We assume that s, and hence $d_I$, is obtained
from $d_c$ by a selection mechanism which assigns probabilities $p(s|x_c)$, possibly
dependent on $x_c = (x_1,\ldots,x_N)$ but not on $\phi$, to selecting each of the $\binom{N}{n}$ possible
subsets s.
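
As a concrete illustration of this set-up, the following minimal Python sketch simulates complete data and then applies a selection mechanism whose probabilities depend on the x values only. The linear model for y, the exponential selection weights and the function name `draw_incomplete_data` are illustrative assumptions, not part of the original development.

```python
import math
import random


def draw_incomplete_data(N, n, seed=0):
    """Simulate complete data d_c, then return the incomplete data d_I."""
    rng = random.Random(seed)
    # Complete data: an assumed linear model, used only for illustration.
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

    # Hypothetical mechanism p(s | x_c): draw n units without replacement,
    # each draw made with probability proportional to exp(x_i). The
    # mechanism uses x only, never y or the model parameters.
    remaining = list(range(N))
    s = []
    for _ in range(n):
        weights = [math.exp(x[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        s.append(pick)
    s.sort()

    y_obs = {i: y[i] for i in s}  # y is missing for every i not in s
    return {"y_obs": y_obs, "s": s, "x": x}


d_I = draw_incomplete_data(N=50, n=10)
print(len(d_I["s"]), len(d_I["x"]))
```

The returned dictionary mirrors $d_I$: the observed y values, the selected subset s, and all N values of x.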

The  above  set-up  occurs,  for  example,  in  survey  sampling  where  a  sample  s  is 
selected  from  a  finite  population  U,  an  auxiliary  variable  x  is  known  for  each  unit 
i in U and is available for use in the selection of s and where y is a variable
measured in the survey. The mechanism $p(s|x_c)$ is called the sample design in this
context.

Consider  two  problems: 

(A) the prediction of a function of $(y_1,\ldots,y_N)$, specifically $\bar{y} = N^{-1}\sum_{i=1}^{N} y_i$,

(B) the estimation of a parameter of the marginal distribution of y, specifically
$\mu_y = E(y)$.

In  the  sample  survey  context,  (A)  is  the  traditional  descriptive  use,  usually 
concerned with means and totals, whilst (B) is the analytical use, usually concerned with
the estimation of underlying models such as regression models (see e.g. Hartley and
Sielken, 1975). Our consideration of only $\bar{y}$ and $\mu_y$ may be taken as special cases of
this  more  general  use. 

Under  the  sampling  distribution  approach  to  inference,  it  is  usual  and  natural  in  (A) 
to make predictive inference about $\bar{y}$ conditional on $(s,x_c)$. Prediction proceeds as in
the usual regression context where $(x_i,y_i)$, $i \in s$, are known and $x_i$, $i \notin s$, are new x
values for which we wish to predict y. More formally, note that the conditional
distribution of $d_c$ given $d_I$ depends only on the parameter $\psi$ which indexes the
conditional distribution of y given x. Suppose we may write $\phi = (\psi,\lambda)$, where
$p(y,x;\phi) = p(y|x;\psi)\,p(x;\lambda)$, so that the distribution of the data is

$$p(d_I;\phi) = p(y_{i_1},\ldots,y_{i_n} \mid s,x_c;\psi)\,p(s|x_c)\,p(x_c;\lambda)
= \Big[\prod_{i \in s} p(y_i|x_i;\psi)\Big]\,p(s|x_c)\,\prod_{i=1}^{N} p(x_i;\lambda). \qquad (1)$$

Then, provided $\psi$ and $\lambda$ are variation independent, it may be argued (Cox and
Hinkley, 1974, p. 35; Barndorff-Nielsen, 1978, p. 50) that $(s,x_c)$ is (extended, S-)
ancillary for $\psi$, treating $\lambda$ as a nuisance parameter, and that inference about $\psi$ and
hence the prediction of $\bar{y}$ should be made conditional on $(s,x_c)$. (More precisely we
condition on a minimal sufficient reduction of $(s,x_c)$, see Section 4.)


For the estimation of $\mu_y$ in (B), however, the same argument does not apply.
Although $(s,x_c)$ may be ancillary for $\psi$, the parameter of interest $\mu_y$ is in general a
function not only of $\psi$ but also of $\lambda$ and will not be identifiable in the conditional
distribution of $d_I$ given $(s,x_c)$. Hence inference about $\mu_y$ conditional on $(s,x_c)$ is
generally inappropriate.

Now  if  N  is  large,  and  in  the  sample  survey  context  N  may  be  very  large,  the 
difference between $\bar{y}$ and $\mu_y$ may be very small, even though it would appear from the
discussion above that sampling distribution inference about these two quantities could
proceed quite differently. This apparent 'paradox' provides one source of motivation for
seeking an alternative procedure for conditional inference about $\mu_y$.

In Bayesian or Likelihood inference about $\mu_y$ the selection mechanism only enters as
a multiplicative factor free of $\phi$ in the likelihood in (1) and hence is ignorable. In
other words, making the false assumption that s is derived from U by simple random
sampling rather than by the true mechanism $p(s|x_c)$ would make no difference to the
inference for given $d_I$. In Rubin's (1976) terminology this is because we have assumed the
data is 'missing at random'. The selection mechanism is not, however, ignorable for
unconditional sampling distribution inference (nor for inference conditional on s but not
on $x_c$ as in Rubin, 1976).

In  Section  2  we  present  the  three  examples.  They  are  chosen  to  be  of  increasing  order 
of  complexity.  In  Example  1  there  is  an  exact  ancillary  and  the  conditional  distribution 
for inference about $\mu_y$ is exactly normal. In Example 2 there is an exact ancillary but
the  conditional  distribution  is  only  asymptotically  normal.  In  Example  3  there  is  only  an 
approximate  ancillary. 

In Section 3 we consider the prediction of $\bar{y}$ conditional on $(s,x_c)$. In Section 4
we consider the estimation of $\mu_y$ on the assumption that s is a simple random sample
from U. We adopt the approach of Barndorff-Nielsen (1980) to determine appropriate
conditioning procedures. These are then compared with Section 3. In Section 5 we compare
the conditional procedures with the use of asymptotic maximum likelihood theory using
dispersion estimates based on either observed or expected Fisher information matrices. In
Section 6 we consider conditioning under a general selection mechanism $p(s|x_c)$ and
discuss in what sense this mechanism is ignorable.



In asymptotic arguments we shall assume that n indexes fixed sequences $N = N_n$ and
$p(s|x_c) = p_n(s|x_c)$ and that $n/N_n \to f$, a constant, as $n \to \infty$. Notation will be of three
types. The key quantities $\phi$, $\psi$, $\lambda$, $\mu_y$, t, a are defined generally but take different
forms in the different examples. The remaining statistics, such as $\bar{x}_s$ and $\bar{x}$, are defined in
terms of $d_I$ and are invariant with respect to the example. The remaining parameters,
such as $\alpha$, $\beta$ and $\gamma$, are example-specific.


2. THE EXAMPLES


Example  1 

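We assume (y,x) is bivariate normal. The parameter vector is $\phi = (\psi,\lambda)$ where
$\psi = (\alpha,\beta,\sigma^2_{y.x})$, $\lambda = (\mu_x,\sigma_x^2)$ and

$$y \mid x \sim N(\alpha + \beta x,\ \sigma^2_{y.x}), \qquad x \sim N(\mu_x, \sigma_x^2).$$

The parameter of interest is $\mu_y = \alpha + \beta\mu_x$. The MLE of $\phi$ is (Anderson, 1957)

$$\hat\phi = (\bar{y}_s - b_s\bar{x}_s,\ b_s,\ n^{-1}SS_{y.x_s},\ \bar{x},\ N^{-1}SS_x)$$

where

$$\bar{y}_s = n^{-1}\sum_{i \in s} y_i, \qquad \bar{x}_s = n^{-1}\sum_{i \in s} x_i, \qquad \bar{x} = N^{-1}\sum_{i=1}^{N} x_i,$$

$$b_s = \sum_{i \in s} y_i(x_i - \bar{x}_s)/SS_{x_s}, \qquad SS_{x_s} = \sum_{i \in s}(x_i - \bar{x}_s)^2,$$

$$SS_{y.x_s} = \sum_{i \in s}(y_i - \hat\alpha - \hat\beta x_i)^2, \qquad SS_x = \sum_{i=1}^{N}(x_i - \bar{x})^2.$$

Hence the MLE of $\mu_y$ is $\hat\mu_y = \hat\alpha + \hat\beta\bar{x} = \bar{y}_s + b_s(\bar{x} - \bar{x}_s)$, which in survey sampling is
termed the regression estimator (Cochran, 1977). A minimal sufficient statistic for $\phi$ is
$t = (\bar{y}_s, \bar{x}_s, b_s, SS_{x_s}, SS_{y.x_s}, \bar{x}, SS_x)$. Neither $\hat\phi$ nor t depends on the selection mechanism
because $p(s|x_c)$ is a multiplicative factor free of $\phi$ in (1). The family of
distributions for $d_I$ indexed by $\phi$ is a curved exponential family, labelled (7,5) by
Barndorff-Nielsen (1980) since dim(t) = 7, dim($\phi$) = 5.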
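
The closed-form estimates of Example 1 can be sketched in a few lines of Python. The function name `mle_example1` and the data layout (a list of all x values, a dictionary of observed y values keyed by unit index) are assumptions made for this illustration only.

```python
def mle_example1(x, y_obs, s):
    """MLE of (alpha, beta, sigma2_yx, mu_x, sigma2_x) and of mu_y.

    x     : list of all N x-values
    y_obs : dict mapping each i in s to its observed y_i
    s     : list of sampled indices
    """
    n, N = len(s), len(x)
    xbar = sum(x) / N
    xbar_s = sum(x[i] for i in s) / n
    ybar_s = sum(y_obs[i] for i in s) / n
    ss_xs = sum((x[i] - xbar_s) ** 2 for i in s)
    ss_x = sum((xi - xbar) ** 2 for xi in x)
    b_s = sum(y_obs[i] * (x[i] - xbar_s) for i in s) / ss_xs
    alpha = ybar_s - b_s * xbar_s
    ss_yxs = sum((y_obs[i] - alpha - b_s * x[i]) ** 2 for i in s)
    return {
        "alpha": alpha,
        "beta": b_s,
        "sigma2_yx": ss_yxs / n,
        "mu_x": xbar,
        "sigma2_x": ss_x / N,
        "mu_y": ybar_s + b_s * (xbar - xbar_s),  # regression estimator
    }


# Noise-free data y = 1 + 2x, with y observed for the first three units only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
est = mle_example1(x, {0: 1.0, 1: 3.0, 2: 5.0}, [0, 1, 2])
print(est["mu_y"])
```

Note that the selection mechanism appears nowhere in the computation, in line with the remark that neither the MLE nor t depends on it.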


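Example 2

We assume that

$$y \mid x \sim N(\gamma x,\ \sigma^2 x), \qquad x \sim \mathrm{Gamma}(\lambda,k), \quad k \text{ known}$$

(i.e. $p(x;\lambda) = \lambda^k x^{k-1} e^{-\lambda x}/\Gamma(k)$). The parameter vector is $\phi = (\psi,\lambda)$ where $\psi = (\gamma,\sigma^2)$.
The parameter of interest is $\mu_y = \gamma k/\lambda$. The MLE of $\phi$ is

$$\hat\phi = \Big(\bar{y}_s/\bar{x}_s,\ \ n^{-1}\sum_{i \in s}(y_i - \hat\gamma x_i)^2/x_i,\ \ k/\bar{x}\Big).$$

Hence the MLE of $\mu_y$ is $\hat\mu_y = \bar{y}_s\bar{x}/\bar{x}_s$, the ratio estimator (Cochran, 1977). A minimal
sufficient statistic for $\phi$ is $t = (\bar{y}_s, \bar{x}_s, \sum_{i \in s} y_i^2/x_i, \bar{x})$. The family of distributions for
$d_I$ indexed by $\phi$ is a (4,3) curved exponential family.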
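
A minimal Python sketch of the Example 2 estimates follows, assuming the model $y|x \sim N(\gamma x, \sigma^2 x)$ with x gamma distributed with known shape k and rate $\lambda$. The function name `mle_example2` and the data layout are hypothetical choices for illustration.

```python
def mle_example2(x, y_obs, s, k):
    """MLE of (gamma, sigma2, lambda) and the ratio estimator of mu_y.

    x     : list of all N (positive) x-values
    y_obs : dict mapping each i in s to its observed y_i
    s     : list of sampled indices
    k     : known gamma shape parameter
    """
    n, N = len(s), len(x)
    xbar = sum(x) / N
    xbar_s = sum(x[i] for i in s) / n
    ybar_s = sum(y_obs[i] for i in s) / n
    gamma = ybar_s / xbar_s
    # Residuals are weighted by 1/x_i because var(y_i | x_i) = sigma2 * x_i.
    sigma2 = sum((y_obs[i] - gamma * x[i]) ** 2 / x[i] for i in s) / n
    lam = k / xbar
    return {
        "gamma": gamma,
        "sigma2": sigma2,
        "lambda": lam,
        "mu_y": gamma * k / lam,  # equals ybar_s * xbar / xbar_s
    }


est = mle_example2([1.0, 2.0, 3.0, 4.0], {0: 2.0, 1: 4.0}, [0, 1], k=2)
print(est["mu_y"])
```

The last entry makes explicit that plugging the MLE into $\mu_y = \gamma k/\lambda$ reproduces the ratio estimator.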


Example  3 

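We assume that y and x are both 0-1 variables with

$$\Pr(y=1 \mid x=0) = \psi_0, \qquad \Pr(y=1 \mid x=1) = \psi_1, \qquad \Pr(x=1) = \lambda.$$

The parameter vector is $\phi = (\psi,\lambda)$ where $\psi = (\psi_0,\psi_1)$. The parameter of interest is
$\mu_y = \Pr(y=1) = \lambda\psi_1 + (1-\lambda)\psi_0$. The MLE of $\phi$ is $\hat\phi = (n_{10}/n_{.0},\ n_{11}/n_{.1},\ N_{.1}/N)$, where the cell
counts in s and U are defined by

$$n_{\alpha\beta} = \sum_{i \in s} y_i^{\alpha}(1-y_i)^{1-\alpha}\,x_i^{\beta}(1-x_i)^{1-\beta}, \qquad
N_{\alpha\beta} = \sum_{i \in U} y_i^{\alpha}(1-y_i)^{1-\alpha}\,x_i^{\beta}(1-x_i)^{1-\beta}, \qquad \alpha,\beta = 0,1$$

and the margins are defined by

$$n_{.\beta} = \sum_{\alpha} n_{\alpha\beta}, \qquad N_{.\beta} = \sum_{\alpha} N_{\alpha\beta}, \qquad \beta = 0,1.$$

The MLE of $\mu_y$ is thus

$$\hat\mu_y = \frac{N_{.0}}{N}\,\frac{n_{10}}{n_{.0}} + \frac{N_{.1}}{N}\,\frac{n_{11}}{n_{.1}},$$

a poststratification estimator (Cochran, 1977). A minimal sufficient statistic for $\phi$ is
$t = (n_{00}, n_{10}, n_{01}, N_{.0})$. This example also defines a (4,3) curved exponential family.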
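
The poststratification estimator of Example 3 can be sketched as follows, taking x and y as 0-1 variables with y observed only for units in s. The function name `mle_example3` and the data layout are assumptions for illustration; the sketch also assumes both x strata are represented in the sample.

```python
def mle_example3(x, y_obs, s):
    """MLE of (psi0, psi1, lambda) and the poststratification estimator.

    x     : list of all N binary x-values
    y_obs : dict mapping each i in s to its observed binary y_i
    s     : list of sampled indices
    """
    N = len(x)
    N1 = sum(x)                  # N_{.1}: population count with x = 1
    N0 = N - N1                  # N_{.0}: population count with x = 0
    n1 = sum(x[i] for i in s)    # n_{.1}: sample count with x = 1
    n0 = len(s) - n1             # n_{.0}: sample count with x = 0
    n11 = sum(y_obs[i] for i in s if x[i] == 1)
    n10 = sum(y_obs[i] for i in s if x[i] == 0)
    psi0, psi1, lam = n10 / n0, n11 / n1, N1 / N
    mu_y = (N0 / N) * psi0 + (N1 / N) * psi1
    return {"psi0": psi0, "psi1": psi1, "lambda": lam, "mu_y": mu_y}


x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
est = mle_example3(x, {0: 1, 1: 0, 4: 1, 5: 1, 6: 0}, [0, 1, 4, 5, 6])
print(est)
```

The stratum weights $N_{.0}/N$ and $N_{.1}/N$ come from all N units, while the stratum proportions come from the sample only.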
3. PREDICTION OF $\bar{y}$

A natural predictor of $\bar{y}$ given $d_I$ is the regression predictor

$$N^{-1}\Big[\sum_{i \in s} y_i + \sum_{i \notin s} E(y \mid x = x_i;\hat\psi)\Big].$$

This predictor can be shown to be identical to $\hat\mu_y$ in each of our three examples.
Under conditions obeyed by the examples, this predictor is the minimum variance unbiased
predictor of $\bar{y}$ conditional on $(s,x_c)$ (Skinner, 1983). We now evaluate the conditional
distribution of $\hat\mu_y - \bar{y}$ given $(s,x_c)$ for each of the examples.
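
For Example 1 the identity between the regression predictor and the regression estimator can be verified in a few lines. The derivation below is a worked check using the fitted line $\hat\alpha + \hat\beta x_i$ with $\hat\alpha = \bar{y}_s - b_s\bar{x}_s$ and $\hat\beta = b_s$ for the unobserved units; it is offered as a sketch rather than as the author's own argument.

```latex
\begin{align*}
N^{-1}\Big[\sum_{i\in s} y_i + \sum_{i\notin s}(\hat\alpha+\hat\beta x_i)\Big]
 &= N^{-1}\big[n\bar y_s + (N-n)\hat\alpha + \hat\beta\,(N\bar x - n\bar x_s)\big] \\
 &= N^{-1}\big[n\bar y_s + (N-n)(\bar y_s - b_s\bar x_s) + b_s(N\bar x - n\bar x_s)\big] \\
 &= N^{-1}\big[N\bar y_s + N b_s(\bar x - \bar x_s)\big] \\
 &= \bar y_s + b_s(\bar x - \bar x_s) .
\end{align*}
```

The final line is the regression estimator of Example 1. The analogous checks for the ratio and poststratification estimators follow the same pattern, replacing the fitted line by $\hat\gamma x_i$ and by the within-stratum sample proportions respectively.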

Example  1  (continued) 

We obtain

$$\hat\mu_y - \bar{y} \mid s,x_c \sim N\big(0,\ (1 - n/N + n a_1^2)\,\sigma^2_{y.x}/n\big) \qquad (2)$$

where $a_1 = (\bar{x} - \bar{x}_s)/SS_{x_s}^{1/2}$. Note that it is not necessary to condition on all the $x_i$
values and s but only on $a_1$. An exact $(1-\alpha)$-level conditional confidence interval for
$\bar{y}$ given $a_1$ is

$$\hat\mu_y \pm \big[(1 - n/N + n a_1^2)\,SS_{y.x_s}/n(n-2)\big]^{1/2}\, t_{n-2}(\alpha/2)$$

where $t_\nu(\alpha)$ is the $\alpha$th point of Student's t-distribution with $\nu$ d.f. Note that this
interval does not depend on the selection mechanism. It distinguishes between 'good'
samples where $a_1$ is small and hence $\bar{y}$ may be more precisely predicted and 'bad' samples
where $a_1$ is large and hence $\bar{y}$ is more poorly predicted. Such a distinction is an
essential aim of conditioning.
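
The conditional interval for Example 1 is straightforward to compute. The Python sketch below assumes the interval has centre $\bar{y}_s + b_s(\bar{x} - \bar{x}_s)$ and half-width $[(1 - n/N + n a_1^2)\,SS_{y.x_s}/n(n-2)]^{1/2}$ times a Student t point on n-2 d.f. The function name `conditional_interval_ex1` is hypothetical, and the t point is passed in by the caller (for example from tables) because the Python standard library has no t quantile function.

```python
import math


def conditional_interval_ex1(x, y_obs, s, t_quantile):
    """Conditional prediction interval for ybar given a_1 (Example 1).

    t_quantile : the required point of Student's t with n-2 d.f.,
                 supplied by the caller.
    """
    n, N = len(s), len(x)
    xbar = sum(x) / N
    xbar_s = sum(x[i] for i in s) / n
    ybar_s = sum(y_obs[i] for i in s) / n
    ss_xs = sum((x[i] - xbar_s) ** 2 for i in s)
    b_s = sum(y_obs[i] * (x[i] - xbar_s) for i in s) / ss_xs
    alpha = ybar_s - b_s * xbar_s
    ss_yxs = sum((y_obs[i] - alpha - b_s * x[i]) ** 2 for i in s)

    # a_1 measures how far the sample x-mean is from the population x-mean.
    a1 = (xbar - xbar_s) / math.sqrt(ss_xs)
    variance = (1 - n / N + n * a1 ** 2) * ss_yxs / (n * (n - 2))
    half_width = math.sqrt(variance) * t_quantile
    centre = ybar_s + b_s * (xbar - xbar_s)
    return centre - half_width, centre + half_width


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = {0: 1.0, 1: 2.0, 2: 2.0, 3: 3.0}
print(conditional_interval_ex1(x, y_obs, [0, 1, 2, 3], t_quantile=1.0))
```

A sample whose x-mean is far from the population x-mean has a large $a_1^2$ and therefore a wider interval, which is the 'good' versus 'bad' sample distinction in computational form.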

Example  2  (continued) 

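We obtain

$$\hat\mu_y - \bar{y} \mid s,x_c \sim N\big(0,\ [\bar{x}(N\bar{x} - n\bar{x}_s)/N\bar{x}_s]\,\sigma^2/n\big). \qquad (3)$$

An exact $(1-\alpha)$-level confidence interval for $\bar{y}$ conditional on $(\bar{x}_s,\bar{x})$ is

$$\hat\mu_y \pm \big\{[\bar{x}(N\bar{x} - n\bar{x}_s)/N\bar{x}_s]\,\hat\sigma^2/(n-1)\big\}^{1/2}\, t_{n-1}(\alpha/2).$$

In this example a 'good' sample is one in which y is missing for small x values so
that $\bar{x}_s$ is large.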
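
The variance in (3) can be checked directly from the model $y_i | x_i \sim N(\gamma x_i, \sigma^2 x_i)$ with independence across units. The shorthand $A = n\bar{x}_s$ and $B = N\bar{x}$ is introduced here only for this check.

```latex
\begin{align*}
\hat\mu_y - \bar y
  &= \Big(\frac{B}{NA} - \frac{1}{N}\Big)\sum_{i\in s} y_i \;-\; \frac{1}{N}\sum_{i\notin s} y_i, \\
E(\hat\mu_y - \bar y \mid s, x_c)
  &= \gamma\Big[\frac{B-A}{N} - \frac{B-A}{N}\Big] = 0, \\
\operatorname{var}(\hat\mu_y - \bar y \mid s, x_c)
  &= \frac{(B-A)^2}{N^2A^2}\,\sigma^2 A + \frac{\sigma^2 (B-A)}{N^2}
   = \frac{\sigma^2 (B-A)\,B}{N^2 A}
   = \frac{\bar x\,(N\bar x - n\bar x_s)}{N\bar x_s}\cdot\frac{\sigma^2}{n}.
\end{align*}
```

The variance decreases as $\bar{x}_s$ grows for fixed $\bar{x}$, which is why samples in which y is missing for small x values are the 'good' ones.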
Example 3 (continued)

We may write

$$\hat\mu_y - \bar{y} = N^{-1}(N_{.0}/n_{.0} - 1)\,n_{10} + N^{-1}(N_{.1}/n_{.1} - 1)\,n_{11} - N^{-1}\bar{n}_{10} - N^{-1}\bar{n}_{11}$$

where $\bar{n}_{\alpha\beta} = N_{\alpha\beta} - n_{\alpha\beta}$, $\alpha,\beta = 0,1$.

Conditional on $(s,x_c)$ the quantities $N_{.0}$, $N_{.1}$, $n_{.0}$ and $n_{.1}$ are fixed and the
distribution of $\hat\mu_y - \bar{y}$ is determined by the independent Binomial distributions of $n_{10}$,
$n_{11}$, $\bar{n}_{10}$ and $\bar{n}_{11}$ which possess parameters $(n_{.0},\psi_0)$, $(n_{.1},\psi_1)$, $(N_{.0} - n_{.0},\psi_0)$ and
$(N_{.1} - n_{.1},\psi_1)$ respectively. An exact conditional confidence interval for $\bar{y}$ appears
intractable. Instead, suppose that $0 < \lambda < 1$ and that the sequence of designs is
such that $n_{.0}/n$ is almost surely bounded away from 0 and 1 as $n \to \infty$. Then almost
surely

$$n^{1/2}(\hat\mu_y - \bar{y}) \mid s,x_c \xrightarrow{\;d\;} N\big(0,\ (1 - f a_1)(1-\lambda)\psi_0(1-\psi_0)/a_1 + (1 - f a_2)\lambda\,\psi_1(1-\psi_1)/a_2\big) \qquad (4)$$

where $f = \lim(n/N)$, $a_1 = n_{.0}N/(nN_{.0})$, $a_2 = n_{.1}N/(nN_{.1})$.

A large sample conditional confidence interval may therefore be obtained substituting
$\hat\phi$ for $\phi$.
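
The large-sample variance in (4) can be compared numerically with the exact finite-sample variance implied by the four independent binomial counts. The Python sketch below reads (4) as $(1 - f a_1)(1-\lambda)\psi_0(1-\psi_0)/a_1 + (1 - f a_2)\lambda\psi_1(1-\psi_1)/a_2$ and plugs in $f = n/N$ and $\lambda = N_{.1}/N$; the function names are hypothetical.

```python
def var_exact_ex3(psi0, psi1, N0, N1, n0, n1):
    """n * var(mu_hat_y - ybar | s, x_c), exact for finite n and N.

    Uses var(n_10) = n0*psi0*(1-psi0), var(nbar_10) = (N0-n0)*psi0*(1-psi0),
    and likewise for the x = 1 stratum, with all four counts independent.
    """
    N, n = N0 + N1, n0 + n1
    v = (psi0 * (1 - psi0) * N0 * (N0 - n0) / n0
         + psi1 * (1 - psi1) * N1 * (N1 - n1) / n1) / N ** 2
    return n * v


def var_formula4_ex3(psi0, psi1, N0, N1, n0, n1):
    """Variance expression in (4) with f = n/N and lambda = N1/N plugged in."""
    N, n = N0 + N1, n0 + n1
    f, lam = n / N, N1 / N
    a1 = n0 * N / (n * N0)
    a2 = n1 * N / (n * N1)
    return ((1 - f * a1) * (1 - lam) * psi0 * (1 - psi0) / a1
            + (1 - f * a2) * lam * psi1 * (1 - psi1) / a2)


args = dict(psi0=0.3, psi1=0.6, N0=40, N1=60, n0=10, n1=20)
print(var_exact_ex3(**args), var_formula4_ex3(**args))
```

With these plug-in values the two expressions agree up to rounding, so the asymptotic statement in (4) only replaces $n/N$ and $N_{.1}/N$ by their limits.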


4. ESTIMATION OF $\mu_y$ - MISSING VALUES SELECTED BY SIMPLE RANDOM SAMPLING
In general $\mu_y$ is not identified in the conditional distribution of $d_I$ given
$(s,x_c)$. For example, the transformation $\mu_y \to \mu_y + \beta\tau$, $\mu_x \to \mu_x + \tau$, $\tau$ fixed, leaves
$p(d_I|s,x_c;\phi)$ unaffected in Example 1. Hence conditioning on $(s,x_c)$ as in Section 3 is
inappropriate. Instead we seek what Barndorff-Nielsen (1980) terms a 'conditionality
resolution', that is we seek an exact or approximate ancillary statistic a such that
$(\hat\phi,a)$ is a one-to-one transformation of the minimal sufficient statistic t. The
conditional distribution of $\hat\phi$ given a is then the appropriate one for inference. For
a (k,d) curved exponential family we seek a (k-d)-dimensional ancillary. Whilst $\hat\phi$
and t are unaffected by the missing data mechanism, the ancillarity of any statistic a
may be affected. Hence, in this section we suppose that s is obtained from U by simple
random sampling of fixed size so that $\{x_i;\ i = 1,\ldots,N\}$ are IID whether or not
$i \in s$. We consider the general mechanism in Section 6.

Example  1  (continued) 

For this (7,5) curved exponential family we seek a two-dimensional ancillary. Since
the normal family is a location-scale transformation family the following function of t:

$$a = (a_1,a_2) = \big[(\bar{x} - \bar{x}_s)/SS_{x_s}^{1/2},\ SS_{x_s}/SS_x\big]$$

is an exact ancillary, in the sense that its distribution is free of $\phi$. Note that the
transformation t to $(\hat\phi,a)$ is one-one.

The distribution of $\hat\phi$ given a is in fact tractable although we shall only be
concerned with the distribution of $\hat\mu_y$ given a. Corresponding to (2) we have

$$\hat\mu_y \mid s,x_c \sim N\big(\alpha + \beta\bar{x},\ (1 + n a_1^2)\,\sigma^2_{y.x}/n\big). \qquad (5)$$

But $\bar{x}$ is independent of a (since for example a is a function of $x_i - x_1$,
$i = 2,\ldots,N$) and so

$$\bar{x} \mid a \sim N(\mu_x,\ \sigma_x^2/N). \qquad (6)$$

Hence

$$\hat\mu_y \mid a \sim N\big(\mu_y,\ (1 + n a_1^2)\,\sigma^2_{y.x}/n + \beta^2\sigma_x^2/N\big). \qquad (7)$$

This provides an appropriate conditional distribution for inference about $\mu_y$. In
fact we only need to condition on $a_1$ and from (2) this permits us to apply the same level
of conditioning in the estimation of $\mu_y$ as in the prediction of $\bar{y}$. In the sense of
Section 1 we have therefore resolved the 'paradox' of different levels of conditioning for
the estimation and prediction problems.
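
The step from (5) and (6) to (7) can be spelled out with the laws of total expectation and variance. The sketch below assumes, as stated above, that $\bar{x}$ is independent of a, that the conditional mean in (5) is $\alpha + \beta\bar{x}$, and that the conditional variance in (5) depends on the data only through $a_1$.

```latex
\begin{align*}
E(\hat\mu_y \mid a)
  &= E\{E(\hat\mu_y \mid s, x_c) \mid a\}
   = \alpha + \beta\,E(\bar x \mid a)
   = \alpha + \beta\mu_x = \mu_y, \\
\operatorname{var}(\hat\mu_y \mid a)
  &= E\{\operatorname{var}(\hat\mu_y \mid s, x_c) \mid a\}
   + \operatorname{var}\{E(\hat\mu_y \mid s, x_c) \mid a\} \\
  &= (1 + n a_1^2)\,\sigma^2_{y.x}/n \;+\; \beta^2\sigma_x^2/N .
\end{align*}
```

Normality is preserved because, given a, $\hat\mu_y$ is conditionally normal with a mean that is linear in the normal variable $\bar{x}$ and a variance that is fixed. The second variance term, $\beta^2\sigma_x^2/N$, is exactly what distinguishes estimation of $\mu_y$ from prediction of $\bar{y}$, and it vanishes relative to the first as $n/N \to 0$.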

It only appears possible, however, to obtain a large-sample (rather than exact)
conditional interval for $\mu_y$, since no pivot based on $\hat\mu_y - \mu_y$ seems to exist.

Example  2  (continued) 

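For this (4,3) curved exponential family we seek a one-dimensional ancillary. Since
$\lambda$ is a scale parameter, an exact ancillary is given by $a = \bar{x}/\bar{x}_s$. The transformation
$t \to (\hat\phi,a)$ is one-one. It may be shown that a is independent of $\bar{x}$. Hence the exact [...]

[...] order to conditioning on $(a_1,a_2)$. Also in the limit as $n \to \infty$, $N_{.0}/N$ is independent of
$(a_1,a_2)$. Thus

$$n^{1/2}\{N_{.0}/N - (1-\lambda)\} \mid a_1,a_2 \xrightarrow{\;d\;} N[0,\ f\lambda(1-\lambda)] \qquad (12)$$

and as in (4), almost surely

$$n^{1/2}\big[\hat\mu_y - (N_{.0}\psi_0 + N_{.1}\psi_1)/N\big] \mid a_1,a_2,N_{.0}/N \xrightarrow{\;d\;}
N\big[0,\ \psi_0(1-\psi_0)N_{.0}/(N a_1) + \psi_1(1-\psi_1)(1 - N_{.0}/N)/a_2\big]. \qquad (13)$$

Hence, almost surely

$$n^{1/2}(\hat\mu_y - \mu_y) \mid a_1,a_2 \xrightarrow{\;d\;} N\big[0,\ (1-\lambda)\psi_0(1-\psi_0)/a_1 + \lambda\psi_1(1-\psi_1)/a_2 + (\psi_0-\psi_1)^2\lambda(1-\lambda)f\big]. \qquad (14)$$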
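
The passage from (12) and (13) to (14) rests on a two-term decomposition of the estimation error for the binary example, where $\mu_y = \lambda\psi_1 + (1-\lambda)\psi_0$ and $N_{.1} = N - N_{.0}$. The following worked step is a sketch consistent with those definitions; it reads (12) as a statement about $N_{.0}/N - (1-\lambda)$, which is an assumption about the garbled source.

```latex
\begin{align*}
\hat\mu_y - \mu_y
  &= \Big[\hat\mu_y - \frac{N_{.0}\psi_0 + N_{.1}\psi_1}{N}\Big]
   + (\psi_0 - \psi_1)\Big[\frac{N_{.0}}{N} - (1-\lambda)\Big], \\
n \operatorname{var}(\text{first term}) &\to (1-\lambda)\,\psi_0(1-\psi_0)/a_1 + \lambda\,\psi_1(1-\psi_1)/a_2
   \qquad \text{by (13), with } N_{.0}/N \to 1-\lambda, \\
n \operatorname{var}(\text{second term}) &\to (\psi_0-\psi_1)^2\, f\,\lambda(1-\lambda)
   \qquad \text{by (12)} .
\end{align*}
```

The first term has conditional mean zero given $N_{.0}/N$, so the two terms are uncorrelated and their limiting variances add, giving the three-term variance in (14).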

Note that the level of conditioning is again less than for the prediction of $\bar{y}$,
since in (4) the conditioning is not only on $a_1$ and $a_2$ but also on $N_{.0}/N = 1 - \hat\lambda$.


5. OBSERVED VERSUS EXPECTED FISHER INFORMATION
The asymptotic theory of maximum likelihood estimation for the regular IID case may be
extended to the incomplete data structure of $d_I$ (cf. Hocking and Smith, 1968).
According to this theory, two estimates of the (asymptotic) covariance matrix of $\hat\phi$ are
$j(\hat\phi)^{-1}$ and $i(\hat\phi)^{-1}$ where $j(\phi)$ and $i(\phi)$ are the observed and expected Fisher
information matrices respectively:

$$i(\phi) = E[j(\phi)], \qquad j(\phi) = -\partial^2 \log p(d_I;\phi)/\partial\phi\,\partial\phi^T.$$

Efron and Hinkley (1978) show that $j(\hat\phi)^{-1}$ is often a good approximation to the
conditional variance of $\hat\phi$ given an appropriate ancillary, a. Barndorff-Nielsen (1980)
considers the extension to the multi-parameter case. We now compare $\mathrm{var}(\hat\mu_y|a)$, as
obtained from Section 4, with the estimates derived from $i(\hat\phi)$ and $j(\hat\phi)$, which are

$$v_{obs}(\hat\mu_y) = g'(\hat\phi)^T j(\hat\phi)^{-1} g'(\hat\phi)$$
$$v_{exp}(\hat\mu_y) = g'(\hat\phi)^T i(\hat\phi)^{-1} g'(\hat\phi)$$

where $\mu_y = g(\phi)$, $g'(\phi) = \partial g(\phi)/\partial\phi$.
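
As an illustration of the observed-information variance estimate, consider the binary model of Example 3, where the log-likelihood separates into three binomial pieces so that the observed information at the MLE is diagonal. The Python sketch below is a worked instance under that model, not a reproduction of the author's calculations; the function name `v_obs_ex3` is hypothetical and the MLEs are assumed to lie strictly inside (0, 1).

```python
def v_obs_ex3(n00, n10, n01, n11, N0, N):
    """Delta-method variance g'(phi)^T j(phi)^{-1} g'(phi) at the MLE.

    n_ab : sample count with y = a and x = b
    N0   : population count with x = 0; N is the population size
    """
    n0, n1 = n00 + n10, n01 + n11
    psi0, psi1 = n10 / n0, n11 / n1
    lam = (N - N0) / N
    # Inverse of the diagonal observed information at the MLE:
    # j = diag(n0/(psi0(1-psi0)), n1/(psi1(1-psi1)), N/(lam(1-lam))).
    j_inv = [psi0 * (1 - psi0) / n0,
             psi1 * (1 - psi1) / n1,
             lam * (1 - lam) / N]
    # Gradient of g(phi) = lam*psi1 + (1-lam)*psi0 w.r.t. (psi0, psi1, lam).
    grad = [1 - lam, lam, psi1 - psi0]
    return sum(g * g * v for g, v in zip(grad, j_inv))


print(v_obs_ex3(n00=7, n10=3, n01=8, n11=12, N0=40, N=100))
```

Multiplying this quantity by n gives $(1-\hat\lambda)\hat\psi_0(1-\hat\psi_0)/a_1 + \hat\lambda\hat\psi_1(1-\hat\psi_1)/a_2 + (\hat\psi_0-\hat\psi_1)^2\hat\lambda(1-\hat\lambda)\,n/N$ with $a_1 = n_{.0}N/(nN_{.0})$ and $a_2 = n_{.1}N/(nN_{.1})$, and no knowledge of $p(s|x_c)$ is needed to compute it.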
6. ESTIMATION OF $\mu_y$ - GENERAL MISSING DATA MECHANISM
The mechanism $p(s|x_c)$ does not affect $\hat\phi$ or t but may affect the ancillarity of
the statistics a discussed in Section 4. We shall first show that the exact ancillaries
given for Examples 1 and 2 remain exactly ancillary for a broad class of mechanisms and that
the distribution $\hat\mu_y|a$ is also invariant. We then consider the more general problem.


Example  1  (continued) 


Condition C1: the selection mechanism depends only on the configuration
$c = [(x_3-x_1)/(x_2-x_1),\ (x_4-x_1)/(x_2-x_1),\ \ldots,\ (x_N-x_1)/(x_2-x_1)]$ of $x_c$.

In other words, under C1, $p(s|x_c)$ is invariant with respect to location or scale
changes in x. This condition holds for a variety of sample designs, for example in
stratified random sampling when strata are determined by quantiles of $x_c$ and in truncated
sampling where the point(s) of truncation are quantiles of $x_c$.

Under C1, a remains ancillary. For a is a function of s and c. The
distributions $p(s|c) = p(s|x_c)$ and $p(c)$ are free of $\phi$. Hence the distribution of a
is free of $\phi$. Furthermore $\bar{x}$ is independent of c and is conditionally independent of
s given c. Hence $\bar{x}$ is independent of (s,c) and therefore of a. Thus (5) and (6)
still apply and the distribution $\hat\mu_y|a$ is again given by (7).
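
Condition C1 can be illustrated numerically: a design that uses x only through quantile-based strata is unchanged by a location shift or a positive rescaling of x, and so is the configuration itself. In the Python sketch below the two-stratum median split and the function names are hypothetical choices for illustration.

```python
import statistics


def configuration(x):
    """Configuration of x_c: ratios of differences, as in condition C1."""
    return [(xi - x[0]) / (x[1] - x[0]) for xi in x[2:]]


def median_strata(x):
    """Hypothetical design input: stratum 1 if x_i exceeds the median of x_c."""
    cut = statistics.median(x)
    return [int(xi > cut) for xi in x]


x = [0.3, 1.2, -0.7, 2.5, 0.9, -1.1]
# Location shift by 10 and positive rescaling by 3.
x_transformed = [10.0 + 3.0 * xi for xi in x]

same_strata = median_strata(x) == median_strata(x_transformed)
same_config = all(
    abs(u - v) < 1e-9
    for u, v in zip(configuration(x), configuration(x_transformed))
)
print(same_strata, same_config)
```

Any selection probabilities computed from the stratum labels alone (for example, fixed sampling fractions within each stratum) therefore satisfy the invariance that C1 requires. The check uses a positive scale factor, since a negative one would reverse the ordering on which the strata are based.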

Example  2  (continued) 

Condition C2: the selection mechanism depends only on the ratios
$w = (x_2/x_1,\ x_3/x_1,\ \ldots,\ x_N/x_1)$.

In other words, under C2, $p(s|x_c)$ is invariant with respect to scale changes of
x. This condition holds, for example, with probability proportional to size designs,
where x is a size measure, which are often used in conjunction with the ratio estimator.

Under C2, a remains ancillary by a similar argument to that in Example 1. Also it
may be shown that $\bar{x}$ is independent of (s,w) and therefore of a and hence that the
expressions (8), (9) and (10) remain valid.
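
Condition C2 can be illustrated with the usual target inclusion probabilities of a probability proportional to size design, $\pi_i = n x_i/\sum_j x_j$, which depend on x only through ratios. The Python sketch below uses a hypothetical function name and assumes every $\pi_i$ is at most 1; it does not implement any particular scheme for drawing the sample.

```python
def pps_inclusion_probabilities(x, n):
    """Target first-order inclusion probabilities n * x_i / sum(x).

    Assumes x_i > 0 and n * max(x) <= sum(x), so that every value is <= 1.
    """
    total = sum(x)
    return [n * xi / total for xi in x]


x = [1.0, 2.0, 3.0, 4.0]
pi = pps_inclusion_probabilities(x, n=2)
pi_rescaled = pps_inclusion_probabilities([5.0 * xi for xi in x], n=2)
unchanged = all(abs(u - v) < 1e-12 for u, v in zip(pi, pi_rescaled))
print(pi, unchanged)
```

Rescaling x leaves the probabilities unchanged, which is the scale invariance that C2 asks of $p(s|x_c)$.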

We now turn to the general situation where the statistics a defined in Section 4
need no longer be ancillary, even approximately. We begin by attempting to construct
ancillaries which may now depend on the mechanism $p(s|x_c)$. We could consider the affine
ancillary but a slightly modified approach simplifies the distribution theory for $\hat\mu_y|a$.

Let $\hat\lambda_s$ be the MLE of $\lambda$ were only $x_i$, $i \in s$, to be observed, so for our three examples
$\hat\lambda_s = (\bar{x}_s,\ n^{-1}SS_{x_s})$, $k/\bar{x}_s$ and $n_{.1}/n$ respectively. Let $T(\hat\lambda) = E(\hat\lambda_s|\hat\lambda)$. In each of our
examples $\hat\lambda$ is sufficient for $\lambda$, and hence $T(\hat\lambda)$, although critically dependent on
$p(s|x_c)$, is free of $\lambda$. We assume that the sequence of designs is such that $T(\cdot)$ is
continuous at $\lambda$ and that

$$n^{1/2}[\hat\lambda_s - T(\hat\lambda)] \xrightarrow{\;d\;} N[0,\ k(\lambda)]$$

where $k(\lambda)$ is a finite positive-definite matrix. This seems a fairly weak condition
although, for example, it would exclude the stratified design in Example 3 where $n_{.1}$ is
set at a fixed fraction of n so that $k(\lambda) = 0$. We adopt as an approximate ancillary

$$a = n^{1/2}[\hat\lambda_s - T(\hat\lambda)]\,k(\hat\lambda)^{-1/2}. \qquad (15)$$

Note that, when the missing values are randomly selected, a reduces to the ancillary
in Section 4 for Example 3 and is asymptotically equivalent to the ancillaries for Examples
1 and 2. Since $\mathrm{cov}(\hat\lambda_s - T(\hat\lambda),\hat\lambda) = 0$, $n^{1/2}(\hat\lambda-\lambda)$ will be asymptotically independent of
a. Now the conditional distribution of $\hat\mu_y$ given $(s,x_c)$ is unaffected by selection and
in each of our examples we may write (almost surely)

$$n^{1/2}[\hat\mu_y - f_0(\hat\lambda,\psi)] \mid s,x_c \xrightarrow{\;d\;} N[0,\ f_1(\hat\lambda,\hat\lambda_s,\psi)] \qquad (16)$$

where $f_0$ and $f_1$ are certain functions, continuous at $\hat\lambda = \lambda$, such that $f_0(\lambda,\psi) = \mu_y$.
Now define $f_2$ such that $f_1(\hat\lambda,\hat\lambda_s,\psi) = f_2(\hat\lambda,a,\psi)$. Then

$$n^{1/2}[\hat\mu_y - f_0(\hat\lambda,\psi)] \mid a,\hat\lambda \xrightarrow{\;d\;} N[0,\ f_2(\lambda,a,\psi)]$$

$$n^{1/2}(\hat\lambda-\lambda) \mid a \xrightarrow{\;d\;} N[0,\ f_3(\lambda)]$$

so that $n^{1/2}(\hat\mu_y-\mu_y) \mid a \xrightarrow{\;d\;} N[0,\ f_2(\lambda,a,\psi) + f_4(\phi)]$ where

$$f_4(\phi) = \lim_{n \to \infty} n\,\mathrm{var}[f_0(\hat\lambda,\psi)].$$

The estimated asymptotic variance of $n^{1/2}(\hat\mu_y-\mu_y)$ given a is thus

$$\hat{V} = f_2(\hat\lambda,a,\hat\psi) + f_4(\hat\phi).$$

But from (16) $f_4(\hat\phi)$ and $f_2(\hat\lambda,a,\hat\psi) = f_1(\hat\lambda,\hat\lambda_s,\hat\psi)$ are unaffected by the missing data
mechanism and hence this mechanism is ignorable for conditional inference. In other words,
if we supposed incorrectly that the missing values were randomly selected so that $x_i$,
$i \in s$, is distributed identically to $x_i$, $i \notin s$, we would obtain the same $\hat{V}$.

For a given mechanism the quantity $T(\hat\lambda)$ in (15) and hence the ancillary a may be
very complicated to compute. In practice, however, this is unnecessary. We showed in
Section 5 that $n^{-1}\hat{V}$ was equal to $v_{obs}(\hat\mu_y)$ (with n/N replaced by f) for our
examples. Hence for inferential purposes we only require the straightforward computation
of $\hat\mu_y$ and $v_{obs}(\hat\mu_y)$ which do not depend on $p(s|x_c)$.

Note in contrast that the estimation of the unconditional variance of $\sqrt{n}(\hat\mu_y-\mu_y)$ or
the evaluation of $v_{exp}(\hat\mu_y)$ does depend on $p(s|x_c)$ and can be quite intractable. This
provides a further practical advantage of conditioning.

7.  CONCLUSION 

We have indicated how exact or approximate conditioning arguments may be applied in
three examples of a missing data problem. Conditioning is attractive here for several
reasons: (1) it can permit the mechanism which causes the data to be missing to be
ignored, (2) it can lead to more tractable procedures, (3) it makes inference more
'data-dependent' (Fisher's original motivation).

The results of this article are specific to the examples chosen although some possible
generalisation is suggested in Section 6 for models where t is a 1-1 transformation
of $(\hat\phi,\hat\lambda_s)$. In this case the asymptotic maximum likelihood approach using the observed
information matrix corresponds, under certain conditions, to conditioning on the ancillary
defined in (15).


I  am  grateful  to  D.  R.  Cox  for  fundamental  suggestions. 